{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {},
  "cells": [
    {
      "metadata": {},
      "source": [
        "<td>\n",
        "   <a target=\"_blank\" href=\"https://labelbox.com\" ><img src=\"https://labelbox.com/static/images/logo-v4.svg\" width=190/></a>\n",
        "</td>"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "<td>\n",
        "<a href=\"https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/annotation_import/text.ipynb\" target=\"_blank\"><img\n",
        "src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
        "</td>\n",
        "\n",
        "<td>\n",
        "<a href=\"https://github.com/Labelbox/labelbox-python/tree/master/examples/annotation_import/text.ipynb\" target=\"_blank\"><img\n",
        "src=\"https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white\" alt=\"GitHub\"></a>\n",
        "</td>"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Text Annotation Import\n",
        "* This notebook will provide examples of each supported annotation type for text assets, and also cover MAL and Label Import methods.\n",
        "\n",
        "Supported annotations that can be uploaded through the SDK: \n",
        "\n",
        "* Entity\n",
        "* Classification radio \n",
        "* Classification checklist \n",
        "* Classification free-form text \n",
        "\n",
        "\n",
        "**Not** supported:\n",
        "* Relationships\n",
        "* Segmentation mask\n",
        "* Polygon\n",
        "* Bounding box \n",
        "* Polyline\n",
        "* Point \n",
        "\n",
        "MAL and Label Import: \n",
        "\n",
        "* Model-assisted labeling - used to provide pre-annotated data for your labelers. This will enable a reduction in the total amount of time to properly label your assets. Model-assisted labeling does not submit the labels automatically, and will need to be reviewed by a labeler for submission.\n",
        "* Label Import - used to provide ground truth labels. These can in turn be used and compared against prediction labels, or used as benchmarks to see how your labelers are doing.\n",
        "\n",
        "For information on what types of annotations are supported per data type, refer to the Import text annotations [documentation](https://docs.labelbox.com/reference/import-text-annotations)."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "Notes:\n",
        "  * Wait until the import job is complete before opening the Editor to make sure all annotations are imported properly.\n",
        "  * You may need to refresh your browser in order to see the results of the import job."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "### Setup\n"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "!pip install -q \"labelbox[data]\""
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "import labelbox as lb\n",
        "import labelbox.types as lb_types\n",
        "import uuid\n",
        "import json"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Replace with your API key\n",
        "Guides on [Create an API key](https://docs.labelbox.com/docs/create-an-api-key)"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Add your api key\n",
        "API_KEY=\"\"\n",
        "client = lb.Client(API_KEY)\n"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Supported annotations for text"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "### Supported Python annotation types and NDJSON"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "########## Entities ##########\n",
        "\n",
        "# Python annotation\n",
        "named_entity = lb_types.TextEntity(start=10, end=20)\n",
        "named_entitity_annotation = lb_types.ObjectAnnotation(value=named_entity, name = \"named_entity\")\n",
        "\n",
        "\n",
        "# NDJSON\n",
        "entities_ndjson = { \n",
        "    \"name\": \"named_entity\",\n",
        "    \"location\": { \n",
        "        \"start\": 67, \n",
        "        \"end\": 128 \n",
        "    }\n",
        "}"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "########## Classification - Radio (single choice ) ##########\n",
        "\n",
        "# Python annotation \n",
        "radio_annotation = lb_types.ClassificationAnnotation(\n",
        "    name=\"radio_question\",\n",
        "    value=lb_types.Radio(answer = \n",
        "        lb_types.ClassificationAnswer(name = \"first_radio_answer\")\n",
        "    )\n",
        ")\n",
        "\n",
        "\n",
        "# NDJSON\n",
        "radio_annotation_ndjson = {\n",
        "  \"name\": \"radio_question\",\n",
        "  \"answer\": {\"name\": \"first_radio_answer\"}\n",
        "} "
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "########## Classification - Radio and Checklist (with subclassifications)  ##########\n",
        "\n",
        "nested_radio_annotation = lb_types.ClassificationAnnotation(\n",
        "  name=\"nested_radio_question\",\n",
        "  value=lb_types.Radio(\n",
        "    answer=lb_types.ClassificationAnswer(\n",
        "      name=\"first_radio_answer\",\n",
        "      classifications=[\n",
        "        lb_types.ClassificationAnnotation(\n",
        "          name=\"sub_radio_question\",\n",
        "          value=lb_types.Radio(\n",
        "            answer=lb_types.ClassificationAnswer(\n",
        "              name=\"first_sub_radio_answer\"\n",
        "            )\n",
        "          )\n",
        "        )\n",
        "      ]\n",
        "    )\n",
        "  )\n",
        ")\n",
        "# NDJSON\n",
        "nested_radio_annotation_ndjson= {\n",
        "  \"name\": \"nested_radio_question\",\n",
        "  \"answer\": {\n",
        "      \"name\": \"first_radio_answer\",\n",
        "      \"classifications\": [{\n",
        "          \"name\":\"sub_radio_question\",\n",
        "          \"answer\": { \"name\" : \"first_sub_radio_answer\"}\n",
        "        }]\n",
        "    }\n",
        "}\n",
        "\n",
        "nested_checklist_annotation = lb_types.ClassificationAnnotation(\n",
        "  name=\"nested_checklist_question\",\n",
        "  value=lb_types.Checklist(\n",
        "    answer=[lb_types.ClassificationAnswer(\n",
        "      name=\"first_checklist_answer\",\n",
        "      classifications=[\n",
        "        lb_types.ClassificationAnnotation(\n",
        "          name=\"sub_checklist_question\",\n",
        "          value=lb_types.Checklist(\n",
        "            answer=[lb_types.ClassificationAnswer(\n",
        "            name=\"first_sub_checklist_answer\"\n",
        "          )]\n",
        "        ))\n",
        "      ]\n",
        "    )]\n",
        "  )\n",
        ")\n",
        "nested_checklist_annotation_ndjson = {\n",
        "  \"name\": \"nested_checklist_question\",\n",
        "  \"answer\": [{\n",
        "      \"name\": \"first_checklist_answer\", \n",
        "      \"classifications\" : [\n",
        "        {\n",
        "          \"name\": \"sub_checklist_question\", \n",
        "          \"answer\": {\"name\": \"first_sub_checklist_answer\"}\n",
        "        }          \n",
        "      ]         \n",
        "  }]\n",
        "}"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "########## Classification - Checklist (Multi-choice) ##########\n",
        "\n",
        "# Python annotation\n",
        "checklist_annotation = lb_types.ClassificationAnnotation(\n",
        "    name=\"checklist_question\",\n",
        "    value=lb_types.Checklist(answer = [\n",
        "        lb_types.ClassificationAnswer(name = \"first_checklist_answer\"),\n",
        "        lb_types.ClassificationAnswer(name = \"second_checklist_answer\"),\n",
        "        lb_types.ClassificationAnswer(name = \"third_checklist_answer\")\n",
        "    ])\n",
        "  )\n",
        "\n",
        "\n",
        "# NDJSON\n",
        "checklist_annotation_ndjson = {\n",
        "  \"name\": \"checklist_question\",\n",
        "  \"answer\": [\n",
        "    {\"name\": \"first_checklist_answer\"},\n",
        "    {\"name\": \"second_checklist_answer\"},\n",
        "    {\"name\": \"third_checklist_answer\"},\n",
        "  ]\n",
        "}"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "########## Classification Free-Form text  ##########\n",
        "\n",
        "# Python annotation\n",
        "text_annotation = lb_types.ClassificationAnnotation(\n",
        "    name = \"free_text\", \n",
        "    value = lb_types.Text(answer=\"sample text\")\n",
        ")\n",
        "\n",
        "#  NDJSON\n",
        "text_annotation_ndjson = {\n",
        "  \"name\": \"free_text\",\n",
        "  \"answer\": \"sample text\",\n",
        "}"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "## Upload Annoations - putting it all together "
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "### Step 1: Import data rows into Catalog"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# You can now include ohter fields like attachments, media type and metadata in the data row creation step: https://docs.labelbox.com/reference/text-file   \n",
        "global_key = \"lorem-ipsum.txt\"\n",
        "text_asset = {\n",
        "    \"row_data\": \"https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt\",\n",
        "    \"global_key\": global_key,\n",
        "    \"media_type\": \"TEXT\",\n",
        "    \"attachments\": [{\"type\": \"TEXT_URL\", \"value\": \"https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt\"}]\n",
        "    }\n",
        "\n",
        "dataset = client.create_dataset(\n",
        "    name=\"text_annotation_import_demo_dataset\", \n",
        "    iam_integration=None # Removing this argument will default to the organziation's default iam integration\n",
        ")\n",
        "task = dataset.create_data_rows([text_asset])\n",
        "task.wait_till_done()\n",
        "print(\"Errors:\",task.errors)\n",
        "print(\"Failed data rows:\", task.failed_data_rows)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Step 2:  Create/select an ontology\n",
        "Your project should have the correct ontology setup with all the tools and classifications supported for your annotations, and the tool and classification `name` should match the `name` field in your annotations to ensure the correct feature schemas are matched.\n",
        "\n",
        "For example, when we create the checklist annotation above, we provided the `name` as `checklist_question`. Now, when we setup our ontology, we must ensure that the name of my classification tool is also `checklist_question`. The same alignment must hold true for the other tools and classifications we create in our ontology.\n",
        "\n",
        "[Documentation for reference ](https://docs.labelbox.com/reference/import-text-annotations)"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "## Setup the ontology and link the tools created above.\n",
        "\n",
        "ontology_builder = lb.OntologyBuilder(\n",
        "  classifications=[ # List of Classification objects\n",
        "    lb.Classification( \n",
        "      class_type=lb.Classification.Type.RADIO, \n",
        "      name=\"radio_question\", \n",
        "      options=[lb.Option(value=\"first_radio_answer\")]\n",
        "    ),\n",
        "    lb.Classification( \n",
        "      class_type=lb.Classification.Type.RADIO, \n",
        "      name=\"nested_radio_question\", \n",
        "      options=[\n",
        "        lb.Option(value=\"first_radio_answer\",\n",
        "          options=[\n",
        "              lb.Classification(\n",
        "                class_type=lb.Classification.Type.RADIO,\n",
        "                name=\"sub_radio_question\",\n",
        "                options=[\n",
        "                  lb.Option(value=\"first_sub_radio_answer\")\n",
        "                ]\n",
        "            ),\n",
        "          ]\n",
        "        ),\n",
        "      ], \n",
        "    ),\n",
        "     lb.Classification(\n",
        "      class_type=lb.Classification.Type.CHECKLIST,\n",
        "      name=\"nested_checklist_question\",\n",
        "      options=[\n",
        "          lb.Option(\"first_checklist_answer\",\n",
        "            options=[\n",
        "              lb.Classification(\n",
        "                  class_type=lb.Classification.Type.CHECKLIST,\n",
        "                  name=\"sub_checklist_question\", \n",
        "                  options=[lb.Option(\"first_sub_checklist_answer\")]\n",
        "              )\n",
        "          ]\n",
        "        )\n",
        "      ]\n",
        "    ),\n",
        "    lb.Classification( \n",
        "      class_type=lb.Classification.Type.CHECKLIST, \n",
        "      name=\"checklist_question\", \n",
        "      options=[\n",
        "        lb.Option(value=\"first_checklist_answer\"),\n",
        "        lb.Option(value=\"second_checklist_answer\"), \n",
        "        lb.Option(value=\"third_checklist_answer\")            \n",
        "      ]\n",
        "    ), \n",
        "     lb.Classification( # Text classification given the name \"text\"\n",
        "      class_type=lb.Classification.Type.TEXT,\n",
        "      name=\"free_text\"\n",
        "    )\n",
        "  ],\n",
        "  tools=[ # List of Tool objects\n",
        "         lb.Tool(\n",
        "            tool=lb.Tool.Type.NER, \n",
        "            name=\"named_entity\"\n",
        "          ),\n",
        "    ]\n",
        ")\n",
        "\n",
        "ontology = client.create_ontology(\"Ontology Text Annotations\", ontology_builder.asdict())\n"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Step 3: Create a labeling project \n",
        "Connect the ontology to the labeling project "
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Project defaults to batch mode with benchmark quality settings if this argument is not provided\n",
        "# Queue mode will be deprecated once dataset mode is deprecated\n",
        "\n",
        "project = client.create_project(name=\"Text Annotation Import Demo\",\n",
        "                                    media_type=lb.MediaType.Text)\n",
        "\n",
        "\n",
        "project.setup_editor(ontology)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Step 4: Send a batch of data rows to the project "
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Setup Batches and Ontology\n",
        "\n",
        "# Create a batch to send to your MAL project\n",
        "batch = project.create_batch(\n",
        "  \"first-batch-text-demo\", # Each batch in a project must have a unique name\n",
        "  global_keys=[global_key], # Paginated collection of data row objects, list of data row ids or global keys\n",
        "  priority=5 # priority between 1(Highest) - 5(lowest)\n",
        ")\n",
        "\n",
        "print(\"Batch: \", batch)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Step 5: Create the annotations payload\n",
        "\n",
        "Create the annotations payload using the snippets of code above\n",
        "\n",
        "Labelbox support two formats for the annotations payload: NDJSON and Python Annotation types. Both are described below. If you are using Python Annotation types, compose your annotations into Labels attached to the data rows."
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "#### Python annotations"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Create a Label\n",
        "labels = []\n",
        "labels.append(\n",
        "    lb_types.Label(\n",
        "        data=lb_types.TextData(\n",
        "            global_key=global_key),\n",
        "        annotations = [\n",
        "            named_entitity_annotation, \n",
        "            radio_annotation, \n",
        "            checklist_annotation, \n",
        "            text_annotation,\n",
        "            nested_checklist_annotation,\n",
        "            nested_radio_annotation\n",
        "        ]\n",
        "    )\n",
        ")"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "#### NDJSON annotations"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "label_ndjson = []\n",
        "for annotations in [entities_ndjson, \n",
        "                   radio_annotation_ndjson,  \n",
        "                   checklist_annotation_ndjson,\n",
        "                   text_annotation_ndjson,\n",
        "                   nested_radio_annotation_ndjson,\n",
        "                   nested_checklist_annotation_ndjson,\n",
        "                    ] :\n",
        "  annotations.update({\n",
        "      \"dataRow\": { \"globalKey\": global_key }\n",
        "  })                   \n",
        "  label_ndjson.append(annotations)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Step 6: Upload annotations to a project as pre-labels or ground truth\n",
        "For the purpose of this tutorial only import one of the annotations payloads at the time (NDJSON or Python Annotation types). \n",
        "\n",
        "\n"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "#### Model-Assisted Labeling (MAL)"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Upload MAL label for this data row in project\n",
        "upload_job_mal = lb.MALPredictionImport.create_from_objects(\n",
        "    client = client, \n",
        "    project_id = project.uid, \n",
        "    name=\"mal_import_job\"+str(uuid.uuid4()), \n",
        "    predictions=labels)\n",
        "\n",
        "upload_job_mal.wait_until_done()\n",
        "print(\"Errors:\", upload_job_mal.errors)\n",
        "print(\"Status of uploads: \", upload_job_mal.statuses)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "#### Label Import "
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# Upload label for this data row in project \n",
        "upload_job_label_import = lb.LabelImport.create_from_objects(\n",
        "     client = client, \n",
        "     project_id = project.uid, \n",
        "     name=\"label_import_job\"+str(uuid.uuid4()),  \n",
        "     labels=labels)\n",
        "\n",
        "upload_job_label_import.wait_until_done()\n",
        "print(\"Errors:\", upload_job_label_import.errors)\n",
        "print(\"Status of uploads: \", upload_job_label_import.statuses)"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    },
    {
      "metadata": {},
      "source": [
        "### Optional deletions for cleanup"
      ],
      "cell_type": "markdown"
    },
    {
      "metadata": {},
      "source": [
        "# project.delete()\n",
        "# dataset.delete()"
      ],
      "cell_type": "code",
      "outputs": [],
      "execution_count": null
    }
  ]
}